Metabarcoding and Metagenomics — Latest Matching Preprints

1

Complementary Insights from Environmental DNA and Environmental RNA Metabarcoding for Marine Biodiversity Assessment Around San Andres Island, Colombia

Bedingfield, S. K.; Vanegas Moreno, C.; More, A. F.

2026-06-08 genetics 10.64898/2026.06.03.730006 medRxiv

Top 0.1%

18.5%

Environmental DNA (eDNA) metabarcoding has become a cornerstone of marine biodiversity monitoring, yet it recovers genetic material irrespective of organism viability and may therefore conflate historical and contemporary community signals. Environmental RNA (eRNA), derived from less stable ribonucleic acid, is hypothesized to be biased toward metabolically active organisms and may provide a more temporally resolved snapshot of living communities. Here we present a paired eDNA/eRNA metabarcoding comparison across a tropical marine seascape, analyzing 19 co-sampled sites spanning coral reefs, mangroves, a seagrass bed, shipwrecks, a cenote, and coastal infrastructure around San Andres Island, Colombia. To our knowledge this is the first in situ, ecosystem-scale paired eDNA/eRNA survey of the broad eukaryotic community across multiple natural habitat types in a tropical marine system, extending mesocosm and freshwater work (e.g., Giroux et al., 2022) to a field setting. Using COI-region amplicon sequencing processed by NatureMetrics, we recovered 1,944 operational taxonomic units (OTUs) across the 19 paired sites. Of these, 1,015 (52.2%) were detected by both approaches, 305 (15.7%) were unique to eDNA, and 624 (32.1%) were unique to eRNA. The eRNA-unique fraction was taxonomically enriched for groups including diatoms (class Bacillariophyceae, phylum Ochrophyta), ciliates, and other protists. Paired Wilcoxon signed-rank tests showed that eRNA recovered significantly higher OTU richness (median 239 vs. 207; W = 36, p = 0.016) and Shannon diversity (median 3.64 vs. 3.38; W = 40, p = 0.026) than eDNA. The mean per-site Jaccard similarity between paired samples was 0.40, indicating substantial turnover in the rare-taxon composition recovered by each method. 
Principal coordinates analysis of Bray-Curtis dissimilarity showed that habitat type structured abundance-weighted community composition (PERMANOVA F = 2.49, p = 0.001) whereas molecular method did not (F = 1.37, p = 0.107). A PERMDISP test found homogeneous multivariate dispersion between methods (F = 0.01, p = 0.92), reinforcing the absence of a method effect, but significant dispersion heterogeneity among habitats (F = 24.0, p < 0.01), so the habitat result is interpreted with caution. Indicator species analysis identified 73 OTUs significantly associated with one template: eDNA indicators were dominated by dinoflagellates (Dinophyceae) and eRNA indicators by diatoms (Bacillariophyceae) and fungi, consistent with an eRNA bias toward metabolically active microbial eukaryotes. A read-weighted overlap analysis showed that although eRNA-unique OTUs outnumbered eDNA-unique OTUs roughly two to one, the large majority of reads (>95%) fell in shared OTUs, so method-unique detections are predominantly rare taxa. We discuss the complementary value of eRNA for marine monitoring, with the seagrass habitat -- where eRNA reduced masking by terrestrial plant material -- as the clearest use case, and propose, rather than prescribe, the integration of eRNA into routine programs.
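
The overlap figures reported in this abstract follow from simple set arithmetic. The Python sketch below recomputes the shared and method-unique percentages from the reported OTU counts and shows how a Jaccard index is computed for one pair of samples. The function names and the toy OTU identifiers are illustrative assumptions and are not taken from the authors' pipeline.

```python
def overlap_percentages(shared, only_a, only_b):
    """Return (shared, only_a, only_b) as percentages of the pooled total."""
    total = shared + only_a + only_b
    return tuple(round(100 * n / total, 1) for n in (shared, only_a, only_b))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Counts reported in the abstract: shared, eDNA-only, eRNA-only OTUs.
pooled = overlap_percentages(1015, 305, 624)

# Hypothetical OTU sets for a single paired site (invented identifiers).
site_edna = {"otu1", "otu2", "otu3", "otu4"}
site_erna = {"otu3", "otu4", "otu5", "otu6", "otu7"}
site_jaccard = jaccard(site_edna, site_erna)

print(pooled)                  # percentages of the 1,944 pooled OTUs
print(round(site_jaccard, 2))  # similarity for the toy site
```

The pooled shared fraction (1,015 of 1,944 OTUs) and the mean per-site Jaccard of 0.40 need not agree: an OTU detected only by eDNA at one site and only by eRNA at another counts as shared in the pooled tally but not at either site.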

2

Metabarcoding replicate detection frequency tracks ddPCR copy number for cod and herring eDNA in ancient marine sediments

Banos Lara, E.; Holman, L. E.; Knudsen, S. W.; Bohmann, K.

2026-07-08 genetics 10.64898/2026.07.03.736335 medRxiv

Top 0.1%

7.4%

1. Detecting environmental DNA (eDNA) from rare or low-abundance aquatic species remains a major challenge, particularly when it is highly degraded, present at low concentrations, and dominated by DNA from non-target taxa. These challenges are further amplified in sedimentary ancient DNA (sedaDNA) studies, where thousands of years can degrade eDNA further, making the detection and quantitative interpretation of weak biological signals difficult. 2. Metabarcoding is commonly used to produce high-throughput community-level data from eDNA but is inherently compositional and influenced by amplification biases. Nonetheless, metabarcoding read abundance or PCR replicate detection frequency are increasingly used as proxies for relative DNA concentration, but their quantitative interpretation has rarely been evaluated against independent measures of absolute DNA abundance. 3. We used droplet digital PCR (ddPCR) to quantify mitochondrial DNA from Atlantic cod (Gadus morhua) and Atlantic herring (Clupea harengus) in 136 ancient eDNA extracts from Icelandic marine sediment cores spanning the last three millennia. We compared ddPCR copy number estimates with metabarcoding (18S) derived relative abundance and detection frequency, and evaluated whether temporal DNA trends corresponded with proxy reconstructed sea surface temperature (SST) variability. 4. We found that ddPCR-measured fish sedaDNA abundance was positively correlated with the proportion of metabarcoding PCR replicates for both Atlantic cod and Atlantic herring. Moreover, temporal trends in Atlantic herring DNA abundance were consistent with proxy reconstructed SST variability, supporting the ecological relevance of the molecular signal. 5. Overall, our results show that ddPCR-derived DNA concentrations and metabarcoding PCR replicate detection frequency capture consistent patterns in low-abundance fish sedaDNA from marine sediments. 
The observed agreement between approaches supports the use of PCR replicate detection frequency as a semi-quantitative proxy for low-abundance sedaDNA.
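
To make the comparison in this abstract concrete, the Python sketch below computes a PCR replicate detection frequency per extract and rank-correlates it with ddPCR copy numbers. The abstract reports a positive correlation without naming the statistic, so the use of Spearman's rank correlation here is an assumption, and all values are invented toy data.

```python
import math


def detection_frequency(replicate_hits):
    """Fraction of PCR replicates in which the target taxon was detected."""
    return sum(1 for hit in replicate_hits if hit) / len(replicate_hits)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical data: four extracts, four PCR replicates each.
replicates = [
    [False, False, False, False],
    [True, False, False, False],
    [True, True, False, False],
    [True, True, True, True],
]
ddpcr_copies = [0.0, 1.2, 3.5, 9.8]  # invented copies per reaction

frequencies = [detection_frequency(r) for r in replicates]
rho = spearman(ddpcr_copies, frequencies)
print(frequencies, rho)
```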

3

TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces): An Optimized Algorithm for Vertebrate Taxonomic Assignments in eDNA Metabarcoding, Integrating Molecular, Taxonomic, and Ecological Criteria

Haderle, R.; Jung, G.; Riou, M.; Ung, V.; Jung, J.-L.

2026-07-09 molecular biology 10.64898/2026.06.29.735257 medRxiv

Top 0.1%

5.5%

Environmental DNA (eDNA) metabarcoding has become a powerful approach for large-scale biodiversity assessment, yet taxonomic assignment remains one of its most critical error-prone steps. Current bioinformatic pipelines rely on molecular similarity searches against reference databases, but assignment accuracy is constrained not only by short marker length and database incompleteness, but also by fundamental limitations, including recent species radiations, incomplete lineage sorting, introgression, NUMTs, and the imperfect correspondence between genetic variation and species boundaries. Here, we present TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces), an automated and simple protocol designed to improve taxonomic assignments in eDNA metabarcoding. Initially developed for marine vertebrates, TRIDENT may be used with any barcode and integrates three complementary sources of evidence: molecular similarity (NCBI/GenBank and BOLD), curated taxonomic information (WoRMS), and ecological plausibility derived from biogeographic occurrence data (GBIF). The workflow sequentially constructs candidate taxon lists based on sequence similarity, expands them through taxonomic hierarchies, and filters them using spatial occurrence constraints. It further identifies possible taxa lacking reference barcodes and evaluates their plausibility through CO1-based similarity if data exist in BOLD. TRIDENT has been implemented as a source-available Python tool and tested using empirical eDNA datasets from marine vertebrates as well as simulated communities. Results demonstrate that the tool produces taxonomic assignments consistent with expert manual curation while substantially reducing processing time and attention errors caused by manual processing of large datasets. 
By combining molecular, taxonomic, and ecological criteria within a single framework, TRIDENT improves transparency and reproducibility and provides a robust and flexible solution strengthening confidence in taxonomic identifications in eDNA-based biodiversity assessments.
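
The three-stage workflow described in this abstract (similarity-based candidates, taxonomic expansion, occurrence filtering) can be sketched in Python as below. This is not TRIDENT's code and it queries none of the named databases: the in-memory dictionaries stand in for similarity hits, a taxonomy table, and occurrence records, and the function names, the 97% identity threshold, and all values are hypothetical.

```python
def candidates_by_similarity(hits, min_identity):
    """Step 1: keep species whose best hit meets the identity threshold."""
    return {sp for sp, identity in hits.items() if identity >= min_identity}


def expand_by_genus(candidates, genus_members):
    """Step 2: add congeners of each candidate from a taxonomy table."""
    expanded = set(candidates)
    for species in candidates:
        genus = species.split()[0]
        expanded |= genus_members.get(genus, set())
    return expanded


def filter_by_occurrence(taxa, occurrences, region):
    """Step 3: keep taxa with an occurrence record in the study region."""
    return {t for t in taxa if region in occurrences.get(t, set())}


# Invented stand-ins for similarity hits (percent identity), taxonomy, occurrences.
hits = {
    "Gadus morhua": 99.1,
    "Gadus chalcogrammus": 98.7,
    "Clupea harengus": 85.0,
}
genus_members = {
    "Gadus": {"Gadus morhua", "Gadus chalcogrammus", "Gadus macrocephalus"},
}
occurrences = {
    "Gadus morhua": {"North Atlantic"},
    "Gadus chalcogrammus": {"North Pacific"},
    "Gadus macrocephalus": {"North Pacific"},
}

candidates = candidates_by_similarity(hits, min_identity=97.0)
expanded = expand_by_genus(candidates, genus_members)
# Taxa pulled in by the taxonomy step that had no sequence hit at all.
lacking_reference = expanded - set(hits)
plausible = filter_by_occurrence(expanded, occurrences, "North Atlantic")
print(sorted(plausible), sorted(lacking_reference))
```

Keeping each stage as a separate function mirrors the sequential design the abstract describes and makes it easy to inspect which criterion removed a given candidate.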

4

Tracing the intruders: a global appraisal of marine invasive species detection through DNA-based approaches

Duarte, S.; Costa, F.

2026-05-07 ecology 10.64898/2026.05.05.722998 medRxiv

Top 0.1%

5.4%

Early detection and monitoring of non-indigenous species (NIS) is crucial to prevent their establishment and to reduce ecological and economic impacts in coastal ecosystems. Traditional monitoring approaches, which rely largely on morphological identification of collected organisms, are often time-consuming and may fail to detect species that occur at low abundance, are morphologically cryptic, or are present as inconspicuous life stages. DNA-based approaches, particularly those relying on environmental DNA, have proven well suited to biodiversity monitoring and biosecurity surveillance. By examining genetic material from bulk community samples or released into the environment, DNA-based approaches enable the detection of species without the need for direct observation, thereby increasing detection sensitivity and expanding the scope of monitoring programs. Despite the rapid growth of their use in marine monitoring, a global synthesis of the status and trends of DNA-based approaches for detecting NIS in this environment has been lacking. Here, we present such a synthesis, based on 146 published studies employing DNA for NIS detection in coastal environments. Two main methodological approaches were used across the reviewed studies: DNA metabarcoding, applied in 49% of studies, closely followed by targeted single-species PCR assays, used in 42%. A smaller proportion of studies (10%) combined both approaches, integrating broad community screening with targeted detection to improve surveillance efficiency. Globally, 752 NIS were detected across disparate taxonomic groups, with metazoans representing the largest proportion of detections (464 species), followed by Chromista (210 species) and Plantae (77 species). 
Among these, the most frequently detected taxonomic groups included Dinophyceae (Dinoflagellata), Teleostei (Chordata), Florideophyceae (Rhodophyta), Polychaeta (Annelida), Copepoda and Malacostraca (Arthropoda), and Ascidiacea (Chordata). At the species level, several well-known marine invaders were recurrently reported, including Bugula neritina (Linnaeus, 1758), Styela plicata (Lesueur, 1823), Acartia (Acanthacartia) tonsa Dana, 1849-1852, and Botryllus schlosseri (Pallas, 1766), highlighting the ability of DNA approaches to detect widespread and established invaders across different regions. The mitochondrial cytochrome c oxidase subunit I (COI) gene was the most widely used genetic marker, reflecting its broad taxonomic coverage and extensive representation in reference databases, particularly for targeting Metazoa. Ribosomal RNA genes, particularly 18S and 16S rRNA gene markers, were also frequently employed to target a wider range of eukaryotic taxa. Water was by far the most frequently analyzed substrate, followed by zooplankton and biofouling communities collected from man-made structures. Notably, approximately 31% of all NIS detections reported in the reviewed studies constituted new regional records. These results highlight the potential of eDNA for coastal monitoring but also underline important limitations. Persistent geographical, taxonomic, and methodological biases can affect detection outcomes, and reliance on single sample types or markers may increase false negatives, which is particularly critical for early detection of NIS. Therefore, multi-marker and multi-substrate approaches are essential to improve detection reliability and support effective biosecurity strategies. As reference databases continue to expand and methodological protocols become increasingly standardized, DNA-based monitoring is likely to play a central role in future management and surveillance of biological invasions in coastal ecosystems.

5

First genetic detection and ongoing eDNA monitoring of the golden mussel (Limnoperna fortunei) in California

Stinson, S. A.; Fiske, A.; Funk, E. C.; Kulig, E.; Brown, S.; Gille, D.; Schreier, A.; Sanders, L.; Nagarajan, R. P.; Barney, B.; Baerwald, M.

2026-06-23 genetics 10.64898/2026.06.18.733028 medRxiv

Top 0.1%

5.4%

Here, we report the first genetic confirmation of golden mussels (Limnoperna fortunei) in North America, and the subsequent development, optimization, and deployment of golden mussel eDNA monitoring procedures. Aquatic species invasions are economically costly, disrupt ecosystem functionality, and impact native aquatic communities. Early detection of new invasive species enables rapid response via implementation of effective eradication or control measures and is key for reducing harmful outcomes. Initial species detection and taxonomic identification can be aided by genetic methods that have high detection sensitivity and accuracy. Genetic methods such as environmental DNA (eDNA) sampling can be used to detect invasive species before they become established in new systems, providing an early alert system to inform resource managers. Golden mussels were first detected in North America in October 2024 near the Port of Stockton in the San Francisco Estuary (SFE). The SFE is particularly vulnerable to invasion due to the access and connectivity provided by the presence of engineering infrastructure and shipping lanes. Collaborative efforts between public agencies and academic institutions are underway to develop a coordinated detection and response plan. Early detection followed by a rapid response is the best defense against prolific invasive species, such as the golden mussel.

6

A Genetic Method for Distinguishing Cryptic Pocillopora Species in French Polynesia without Sequencing

Cohn, F. M.; Johnston, E.; Burgess, S.; Sims, J. A.; Layagala, K.; Harnay, P.; Putnam, H. M.; Correa, A. M. S.

2026-04-25 molecular biology 10.64898/2026.04.25.720756 medRxiv

Top 0.1%

3.4%

Pocillopora is a widespread, dominant reef-building coral genus in the Indo-Pacific that exhibits high morphological similarity and plasticity. Given this, genetic tools are needed to robustly identify Pocillopora individuals to the species level. Quick and accurate identification approaches for Pocillopora species are critical to estimating biodiversity patterns under current and future environmental challenges. In recent years, the mitochondrial open reading frame (mtORF) and a histone region (PocHistone) have been validated using genome-wide data to become the most widely used species-level markers for Pocillopora. However, Sanger sequencing of a large number of samples can be prohibitively expensive and sequencing facilities are not always readily available. Therefore, we present restriction fragment length polymorphism (RFLP) digests here that identify the six species of Pocillopora (P. acuta, P. cf. effusa, P. grandis, P. meandrina, P. tuahiniensis, and P. verrucosa) found in French Polynesia, without sequencing. In uninformed validation tests (in silico and in vitro), our protocol identified each Pocillopora species with 100% accuracy. Given their cost-effective, rapid nature, the tailoring of additional RFLP digest protocols to identify cryptic coral species in reef regions around the world will support foundational reef science, conservation and restoration initiatives.
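
The in silico side of an RFLP assay amounts to locating a recognition site and comparing the resulting fragment lengths. The Python sketch below shows that idea under stated assumptions: the abstract does not name the enzymes or give sequences, so the EcoRI site (GAATTC, cut after the first base) and the two short haplotypes are placeholders chosen only for illustration.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases from the start of the site to the cut.
    Returns fragment lengths in 5' to 3' order.
    """
    cuts = []
    start = 0
    while True:
        i = sequence.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def banding_pattern(sequence, site, cut_offset):
    """Fragment lengths sorted largest first, as they would be read off a gel."""
    return sorted(digest(sequence, site, cut_offset), reverse=True)


# Hypothetical amplicons: haplotype_b carries a substitution that removes the site.
haplotype_a = "AAAGAATTCTTT"
haplotype_b = "AAAGACTTCTTT"

pattern_a = banding_pattern(haplotype_a, "GAATTC", 1)
pattern_b = banding_pattern(haplotype_b, "GAATTC", 1)
print(pattern_a, pattern_b)
```

Two haplotypes are distinguishable by a given enzyme only when their banding patterns differ, which is why a substitution inside the recognition site is diagnostic here.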

7

The Leray-XT COI primer pair is not suitable for observing ciliates and radiolarians

Ewers, I.; Mauvisseau, Q.; Jamy, M.; Rueckert, S.; Mahe, F.; Dunthorn, M. E.

2026-05-02 microbiology 10.64898/2026.04.30.721870 medRxiv

Top 0.1%

3.2%

The Leray-XT primer pair has been widely used to amplify the mitochondrial cytochrome c oxidase subunit I (COI) gene from animals. In some marine metabarcoding studies, protists have also been amplified and sequenced using these primers. Here, we ask whether the Leray-XT COI primer pair is suitable for observing ciliates and radiolarians, which are numerically and ecologically important components of marine protistan communities. We show that while there are sufficient COI reference sequences for ciliates in NCBI for taxonomic assignments, there are currently only two COI reference sequences for radiolarians. Using in silico analyses, we additionally show that while the reverse Leray-XT primer can bind and potentially amplify both ciliates and radiolarians, the forward primer cannot bind to either taxon. These results show that the Leray-XT primer pair is not suitable for observing ciliates and radiolarians, although it may be useful for observing other marine protistan taxa.
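
An in silico binding check of the kind mentioned in this abstract can be approximated by sliding a degenerate primer along a template and counting mismatches. The Python sketch below is a simplified illustration, not the authors' method: the primer and templates are invented (they are not the Leray-XT sequences), the mismatch tolerance is an arbitrary assumption, and real analyses also handle the reverse complement and weight 3'-end mismatches more heavily.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, window):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(1 for p, base in zip(primer, window) if base not in IUPAC[p])


def best_binding_site(primer, template):
    """Return (position, mismatches) for the lowest-mismatch window, or None."""
    best = None
    for i in range(len(template) - len(primer) + 1):
        m = mismatches(primer, template[i:i + len(primer)])
        if best is None or m < best[1]:
            best = (i, m)
    return best


def can_bind(primer, template, max_mismatches=2):
    """True if some window has at most max_mismatches (an assumed tolerance)."""
    best = best_binding_site(primer, template)
    return best is not None and best[1] <= max_mismatches


toy_primer = "GGWACY"          # hypothetical degenerate primer
matching = "TTGGTACCAA"        # contains a perfect site at position 2
diverged = "TTCCGGTTAA"        # best window still has three mismatches

print(best_binding_site(toy_primer, matching), can_bind(toy_primer, diverged))
```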

8

Biodiversity effects of beaver activity in a semi-natural enclosure revealed by eDNA

Hanfling, B.; Griffiths, N. P.; Macarthur, J. A.; Morrisey, B.; Svobodova, D.; Pritchard, V. L.; Tree, A.; Gaywood, M. J.

2026-05-16 ecology 10.64898/2026.05.15.725411 medRxiv

Top 0.1%

2.4%

1. Environmental DNA (eDNA) metabarcoding is an emerging tool for biodiversity assessment in freshwater systems, offering high-resolution insights into community composition. Here, we apply eDNA metabarcoding to evaluate the ecological impacts of Eurasian beaver (Castor fiber) activity within a semi-natural enclosure in the Scottish Highlands. 2. We collected seasonal water samples from nine sites, six influenced by beaver dams and three control sites with no evidence of beaver engineering, across a 40-hectare enclosure. Samples were analysed for vertebrate and macroinvertebrate diversity using established 12S and COI markers. 3. Vertebrate alpha diversity did not differ significantly between beaver and control sites, likely reflecting the small spatial scale and low species richness of upland Scottish streams. However, community composition differed significantly between treatments, especially for fish (PERMANOVA, R2 = 0.55, P < 0.001), with beaver-influenced sites dominated by three-spined stickleback and control sites by brown trout. Macroinvertebrate communities showed a 78% increase in gamma diversity in beaver-modified habitats relative to controls. Species composition varied strongly with beaver presence (PERMANOVA, R2 = 0.29, P < 0.001), likely due to the creation of lentic-lotic mosaics and associated microhabitat diversity. Seasonal variation was significant in both taxonomic groups, with the lowest species richness and highest community dispersion observed in summer, probably reflecting hydrological and temperature-driven dynamics in eDNA production and transport. 4. Our findings reinforce previous evidence that beaver dam-building activity enhances beta diversity in headwater systems. Additionally, we demonstrate that eDNA metabarcoding is a sensitive method for detecting spatial patterns in freshwater biodiversity associated with these activities at scales ranging from tens to hundreds of meters. 
These approaches could inform future monitoring strategies aligned with landscape-scale beaver management and reintroductions.

9

Let the prey speak: Using PNA clamps to silence predator DNA in marine faecal diet studies

Polanowski, A. M.; Suter, L.; Deagle, B. E.; McInnes, J. C.

2026-07-08 molecular biology 10.64898/2026.06.22.733645 medRxiv

Top 0.1%

2.1%

DNA metabarcoding of faeces is a powerful, non-invasive method for assessing predator diets. However, when studying the diet of generalist predators, broad PCR primers are used to amplify the wide range of potential prey species, and metabarcoding outputs are often dominated by sequences from the predator. While blocking primers can be used to reduce PCR amplification of predator DNA, they frequently cause only partial predator suppression and unintended prey blocking. Peptide nucleic acid (PNA) clamps offer a promising, underutilised alternative by binding strongly and selectively to predator DNA to block its PCR amplification. In this study we designed and validated a novel PNA clamp targeting the 18S rRNA gene to suppress bird and mammal predator DNA in dietary samples. We tested this clamp on tissue mixtures and faecal samples from three seabird and two seal species across temperate, subantarctic, and Antarctic regions. The PNA clamp substantially increased the proportion of prey reads recovered while maintaining consistent prey community composition across all predator species. Our results not only demonstrate the general effectiveness of PNA clamps over standard blocking primers, but also provide a powerful, broadly applicable new tool to improve accuracy in DNA diet metabarcoding studies.

10

Full-length COI barcodes improve eDNA metabarcoding data denoising relative to mini-barcodes

Eisele, M. H.; Varusk, S.; Sammet, K.; Hakimzadeh, A.; Metsoja, M.; Tedersoo, L.; Alwutayd, K. M.; Arribas, P.; Andujar, C.; Emerson, B. C.; Anslan, S.

2026-07-03 ecology 10.64898/2026.07.03.736260 medRxiv

Top 0.1%

2.0%

Animal COI (mitochondrial cytochrome oxidase I) metabarcoding of environmental DNA (eDNA) is increasingly used to assess biodiversity in complex substrates such as soil. However, due to read-length constraints of second-generation sequencing platforms, mini-barcodes have been used instead of the full barcode region. Long-read sequencing technologies now enable the recovery of full-length barcode sequences, and are more commonly applied for studying microbes, but their use for metabarcoding the full-length standard COI barcoding region in animals remains limited. In this study, we compared three COI amplicon sets -- 313 bp, 660 bp, and 1,256 bp -- amplified from soil eDNA samples and sequenced using Illumina and PacBio platforms to evaluate their overall concurrence, the effectiveness of identifying nuclear mitochondrial DNA segments (NUMTs) and chimeras, as well as their respective taxonomic resolution. The long-read datasets exhibited a higher identification rate of NUMTs and true chimeras, suggesting that longer sequences improve the detection of noise in COI metabarcoding data, thereby reducing the occurrence of spurious taxa. Taxonomy assignment confidence was similar between the 313 bp and 660 bp datasets, whereas extending the amplicon beyond the standard COI barcode region (1,256 bp) reduced confidence, likely because longer reads extend into regions poorly represented in barcode reference databases. Despite substantially lower sequencing depth in the 660 bp dataset, per-sample OTU richness did not differ significantly from that recovered with the Illumina 313 bp amplicon set. Similarly, the relationships between samples were strongly correlated across the detected OTU communities, indicating consistent ecological interpretations between short and long amplicons. 
We conclude that the standard ~658 bp COI barcode is an optimal marker for soil animal metabarcoding from eDNA, balancing target recovery, artifact detection, taxonomic assignment and ecological interpretability. As COI eDNA metabarcoding becomes increasingly used in biodiversity assessment and is increasingly adopted in large-scale monitoring initiatives, this study provides methodological guidance for improving the robustness of soil animal community biomonitoring.

11

Benchmarking full-length ITS metabarcoding across Illumina 2x500, PacBio, and Oxford Nanopore sequencing using mock and soil communities

Tedersoo, L.; Prous, M.; Chen, M.; Anslan, S.; Saar, I.; Dubois, B.; Mikryukov, V.

2026-05-21 bioinformatics 10.64898/2026.05.20.726443 medRxiv

Top 0.1%

1.9%

Metabarcoding is a powerful tool for biodiversity comparisons, where standard-size DNA barcodes (>500 bases) offer better taxonomic resolution than shorter ones. Still, the choice of sequencing platforms and bioinformatics pipelines may strongly affect inferred diversity due to various technical biases. We assessed the relative performance of Illumina MiSeq i100 (2x500 paired-end), PacBio Revio and Oxford Nanopore MinION sequencing and bioinformatics pipelines, using full-length ITS amplicon sequencing datasets from a 103-species mock community and 45 composite soil samples. Despite numerous low-quality reads, PacBio yielded the lowest overall error rate and highest number of taxa. Illumina revealed the highest proportion of chimeric and index-switched reads, along with a strong bias towards shorter amplicons. MinION data analysed using PRONAME and Minovar - a bioinformatics pipeline presented here - had the largest proportion of low-quality data, and rare taxa were lost during data filtering and read polishing steps. Although Minovar enabled amplicon sequence variant (ASV) level precision for common taxa, we recommend clustering ASVs into OTUs. For PacBio, standard filtering approaches outperformed the ASV approach because they retained rare taxa. For Illumina, a stringent ASV approach or removal of rare OTUs would limit artefacts. Across all platforms, excess PCR cycles promoted chimeric and low-quality reads and lost quantitativity in biodiversity assessments. With moderate differences in effect sizes, all analytical approaches supported the conclusion that sampling design determines how we see soil biodiversity responses to land use. For biodiversity surveys based on the full-length ITS metabarcoding, we recommend using PacBio sequencing with standard, non-ASV pipelines.

12

Molecular Star Gazing: Development and Validation of an Environmental DNA Assay for the Imperiled Sunflower Sea Star (Pycnopodia helianthoides)

Gold, Z.; Robinson, K. M.; Gehman, A.-L. M.; Shea, M. M.; Lemay, M. A.; Weinrich, J.; Kellogg, C. T. E.; Clemente-Carvalho, R. B. G.; Schiebelhut, L. M.; Boehm, A. B.; Kidd, A.; Kim, A.; Hodin, J.; Dawson, M.; McAllister, S. M.

2026-05-12 molecular biology 10.64898/2026.05.07.723600 medRxiv

Top 0.1%

1.7%

The sunflower sea star (Pycnopodia helianthoides) suffered a catastrophic population decline across its range from 2013 to 2017 due to the devastating sea star wasting disease (SSWD) pandemic driven by Vibrio pectenicida FHCF-3, with minimal signs of population recovery. The functional extinction of this apex predator across substantial parts of its range has created a need to identify and track the remaining intact populations. Environmental DNA (eDNA) approaches provide a simple, cost-effective, and non-destructive method for monitoring occurrences, and in some cases abundances, of marine species, consistently outperforming visual occurrence monitoring efforts in sensitivity, speed, and cost. Here, we designed, developed, and validated a P. helianthoides-specific eDNA assay to identify refugia, using both quantitative and droplet digital PCR approaches. We first generated the most comprehensive sea star mitochondrial genome reference database to date (n = 93 taxa, n = 15 novel). We then used unikseq and Geneious bioinformatics software to identify the unique nad5 gene region and design a highly specific hydrolysis probe-based PCR assay. We validated the performance of this assay through laboratory, mesocosm, and field testing, demonstrating a highly specific and sensitive assay. In a field application of the new assay across regions in British Columbia, Canada, we found a positive correlation between P. helianthoides eDNA concentrations and biomass density, especially when appropriately accounting for spatiotemporal integration scales (R2 = 0.67). The eDNA assay provides a rapid and scalable tool for monitoring the sunflower sea star, which has been proposed for listing as threatened under the U.S. Endangered Species Act of 1973. 
Molecular tools like the one presented here enhance management and recovery efforts not only by identification and monitoring of remnant wild populations, but also by helping to assess population level response and recovery following reintroduction efforts.

13

Enhancing the Understanding of Environmental Microbiomes through Topic Modeling: A Quantitative and Qualitative Analysis

Kujat, A. S.; Hassenrück, C.; Lüdtke, S.; Labrenz, M.; Sperlea, T.

2026-05-01 ecology 10.64898/2026.04.28.721390 medRxiv

Top 0.1%

1.7%

Background: Understanding ecosystem dynamics is essential for assessing ecosystem health, yet remains challenging due to complex biotic and abiotic interactions. Microbial communities are valuable indicators of environmental change, but the high dimensionality of microbiome data requires advanced analytical methods. This study explores the use of topic modeling (TM), an unsupervised machine learning approach initially designed for text analysis, to analyze microbiome data from the dynamic Warnow Estuary on the southern Baltic Sea coast. Results: We applied TM to estuarine microbiome data and compared its performance to traditional dimensionality reduction methods, Principal Component Analysis (PCA) and Principal Coordinate Analysis (PCoA). Quantitative results indicate that TM performs comparably to conventional approaches in preserving ecological and functional information, and in certain respects outperforms them. In addition, we show qualitatively that NNMF, a TM method, captures latent patterns in the data, providing an interpretable perspective on the microbiome. In this exploratory framework, NNMF suggested five distinct sub-communities within the estuary that appear to follow a seasonal succession influenced by freshwater inflow. These sub-communities were associated with specific ranges of salinity and temperature and showed distinct taxonomic profiles, with shared characteristics across the estuarine system. Conclusions: Our findings suggest that TM is a useful tool for exploring complex environmental microbiome datasets, offering a complementary perspective that can provide additional ecological insights. TM's ability to highlight coherent microbial community patterns indicates its promise for supporting environmental monitoring and informing targeted ecosystem management in dynamic habitats, though further studies are needed to fully assess its applicability.
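
To illustrate how a non-negative matrix factorization splits a sample-by-taxon table into sub-communities ("topics"), the Python sketch below factorizes a tiny count matrix V into sample weights W and taxon profiles H. It uses multiplicative update rules for the squared-error objective, which is an assumption made for this example; the abstract does not state which algorithm or library the authors used, and the count matrix is invented.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def squared_error(v, w, h):
    """Sum of squared differences between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Invented counts: 5 samples (rows) x 4 taxa (columns), two obvious groups.
counts = [
    [8, 6, 0, 0],
    [9, 7, 0, 0],
    [0, 0, 5, 7],
    [0, 0, 6, 8],
    [4, 3, 3, 4],
]

w0, h0 = nmf(counts, k=2, iterations=0)   # random starting point
w, h = nmf(counts, k=2, iterations=200)   # same seed, after updates
error_before = squared_error(counts, w0, h0)
error_after = squared_error(counts, w, h)
print(error_before, error_after)
```

Each row of H can be read as a sub-community profile over taxa and each row of W as how strongly a sample expresses each sub-community, which is what gives the factorization its interpretability.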

14

A practical evaluation of sampling filtration and preservation of environmental DNA samples for the water column monitoring.

Baussant, T.; Krolicka, A.; Kjeilen-Eilertsen, G.; Merzi, T.

2026-06-19 ecology 10.64898/2026.06.18.733101 medRxiv

Top 0.1%

1.5%

Offshore industry still largely relies on traditional approaches for regulatory compliance to environmental impact on the water column. Implementing environmental DNA (eDNA) workflow can offer several advantages, but early stages such as sampling and conservation of the samples require standardization and simplification before they can be routinely applied in offshore monitoring programs. In this study, we assessed the effect of several filter types (Durapore disc, Sterivex capsule and Wattera high-capacity capsule; all with 0.22 {micro}m pore size) allowing for different volume of filtration used for sampling eDNA. We also evaluated the effect of 25 days conservation of unfiltered water samples with different preservative solutions (Benzalkonium chloride -BAC, Longmires solution LONGI and a modified Longmires solution without SDS, LNoSDS) as a viable option when immediate filtration and cold storage are not possible. For downstream eDNA evaluation of filter types and preservation, we used quantitative digital PCR on selected target DNA and metabarcoding for qualitative assessment of marine prokaryotic and eukaryotic communities. Overall, filter choice had relatively less effects on quantitative and qualitative information from eDNA compared with water preservation. Sterivex and Durapore were better filter choices for biodiversity assessment. While the Wattera filter allowed processing of larger water volumes and improved quantification of metazoan DNA, handling and processing were more challenging. For water conservation, LNoSDS was the best option. Chemical agents of LONGI and BAC may provide favourable substrates for some tolerant bacterial strains, altering the microbial community composition, with consequences for the overall qualitative evaluation of conserved eDNA. For targeted metazoan eDNA, however, chemical preservation showed clear benefits. 
This research highlights key considerations and viable options for eDNA sampling and simple preservation workflows without cold storage for implementation in offshore water column monitoring.

Highlights
- Need for standardization of the eDNA workflow for offshore water column monitoring
- Importance of eDNA sampling (filters) and eDNA conservation (preservatives)
- Filter choice does not drastically affect the dominant eDNA communities
- Conservation outside cold storage is challenging for eDNA-based biodiversity evaluation
- Viable options: Sterivex filter for sampling; Longmire's solution (no SDS) for conservation

15

A novel long-amplicon rpoB primer pair for high resolution microbiome analysis at the species-level

Venbrux, M.; Crauwels, S.; Rediers, H.

2026-05-17 molecular biology 10.64898/2026.05.15.725465 medRxiv

Top 0.1%

1.3%

The 16S rRNA gene is the most widely used genetic marker for microbial community profiling, but its limited sequence divergence often prevents species-level identification. The RNA polymerase β-subunit gene (rpoB) offers higher sequence variability, single-copy occurrence, and stronger phylogenetic consistency, yet its adoption in metataxonomic studies has been constrained by the lack of universal primer sets. Here, we present a novel universal primer pair that amplifies an ~1,800 bp rpoB region (rpoB_MV) compatible with long-read sequencing platforms. In silico evaluation across 17,683 bacterial reference genomes demonstrated high universality, with over 86% of genomes predicted to amplify. Compared with full-length and partial 16S rRNA gene markers, the rpoB_MV amplicon exhibited significantly greater inter-species sequence divergence and improved phylogenetic concordance with core-genome trees. Sequencing of two complementary mock communities confirmed superior species-level identification accuracy, with misclassification rates below 0.01% and no reads assigned to unresolved species clusters. These results establish rpoB_MV as a robust alternative to 16S rRNA gene-based profiling for high-resolution metataxonomic applications.

IMPORTANCE: Microbial community studies increasingly require species-level resolution because species within the same genus can differ substantially in pathogenicity, ecological function, and metabolic capacity. Current 16S rRNA gene-based methods frequently fail to distinguish closely related species, collapsing biologically distinct organisms into the same taxonomic assignment and obscuring community differences that matter for clinical diagnostics, food safety, and environmental monitoring. The rpoB_MV primer pair presented here overcomes this limitation by targeting a longer, more variable region of the rpoB gene, enabling accurate species-level identification across diverse bacterial phyla. Combined with advances in long-read sequencing, this approach provides researchers with a practical tool to resolve microbial communities at the species level.
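
As a rough illustration of what an in silico primer universality check like the one described in this abstract might involve, the sketch below matches a degenerate (IUPAC) primer pair against toy templates and reports the fraction predicted to amplify. The primer sequences, the templates, the mismatch rule, and the function names are all hypothetical placeholders; the abstract does not give the actual rpoB_MV primers or the authors' evaluation method, and only the forward strand is searched here.

```python
# Sketch of an in silico amplification check with degenerate primers.
# All sequences below are hypothetical toy data, not the rpoB_MV primers.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def find_site(primer, template, max_mismatch=0):
    """Return the first 0-based index where the primer matches, or -1."""
    width = len(primer)
    for i in range(len(template) - width + 1):
        window = template[i:i + width]
        mismatches = sum(base not in IUPAC[p] for p, base in zip(primer, window))
        if mismatches <= max_mismatch:
            return i
    return -1


def predicted_amplicon(fwd, rev, template, max_mismatch=0):
    """Return the predicted amplicon on the forward strand, or None."""
    start = find_site(fwd, template, max_mismatch)
    if start < 0:
        return None
    offset = start + len(fwd)
    rev_site = revcomp(rev)  # the reverse primer binds the opposite strand
    hit = find_site(rev_site, template[offset:], max_mismatch)
    if hit < 0:
        return None
    return template[start:offset + hit + len(rev_site)]


# Hypothetical toy "genomes": the first two carry both primer sites.
genomes = {
    "genome_1": "TTATGACGGGGGGCCTAGTT",
    "genome_2": "AAATGGCTTTTCCTAGAA",
    "genome_3": "TTTTTTTTTTTTTTTTTTTT",
}
FWD, REV = "ATGRC", "CTAGG"  # hypothetical degenerate primer pair

amplified = [name for name, seq in genomes.items()
             if predicted_amplicon(FWD, REV, seq) is not None]
fraction = len(amplified) / len(genomes)
print(f"{len(amplified)}/{len(genomes)} genomes predicted to amplify ({fraction:.0%})")
```

A real evaluation would also scan the reverse strand, constrain amplicon length, and typically weight mismatches near the 3' end more heavily; those refinements are left out of this sketch.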

16

The old pipe gives the sweetest smoke: A phylogenetic turn for eDNA metabarcoding

Haderle, R.; Ung, V.; Jung, J.-L.

2026-06-15 genetics 10.64898/2026.06.11.731524 medRxiv

Top 0.2%

1.1%

Environmental DNA (eDNA) metabarcoding has transformed biodiversity monitoring, yet most analyses rely on taxonomic metrics that are sensitive to methodological variation and limit cross-study comparability. We propose a "phylogenetic turn" in eDNA analysis through the integration of phylogenetic diversity (PD) metrics. By incorporating evolutionary relationships, PD reduces dependence on species-level resolution, increases robustness to detection biases, and better captures the evolutionary "option value" of biodiversity. We synthesize key PD metrics across richness, divergence, and regularity, emphasizing the use of standardized effect sizes (SES) for ecological interpretation while addressing challenges in metric selection. We apply this framework to five marine eDNA datasets (2021-2025) spanning ecologically and geographically contrasting ecosystems, from tropical to Arctic regions, and encompassing a wide gradient of anthropogenic pressure. Across datasets, we identify consistent patterns: anthropized ecosystems exhibit high taxonomic richness but reduced phylogenetic diversity, indicating phylogenetic clustering, whereas less disturbed systems show lower richness but greater evolutionary breadth. These findings demonstrate that PD reveals ecological structure not captured by taxonomic metrics, including signatures of environmental filtering and community assembly processes. By providing a reproducible analytical workflow based on standardized eDNA datasets, we position phylogenetic diversity as a critical bridge between eDNA data and conservation frameworks. Ultimately, eDNA-based phylogenetic approaches open new avenues for decoding global biodiversity patterns across heterogeneous ecosystems.
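
To make the standardized effect size (SES) idea mentioned in this abstract concrete, here is a minimal sketch. It computes Faith's PD on a toy tree and compares it with a null distribution obtained by drawing the same number of taxa at random from the pool, so that SES = (observed - null mean) / null standard deviation. The tree, the taxon names, the null model, and the function names are assumptions made for illustration; the abstract does not specify which metrics or null models the authors use.

```python
import random
import statistics

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("n2", 0.5),
    "D": ("n2", 0.5),
    "n2": ("root", 3.0),
}
TIPS = ["A", "B", "C", "D"]


def faith_pd(taxa, tree=TREE):
    """Faith's PD: total branch length on the union of tip-to-root paths."""
    seen = set()
    total = 0.0
    for tip in taxa:
        node = tip
        # Walk towards the root, counting each branch only once.
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def ses_pd(taxa, pool=TIPS, n_null=999, seed=0):
    """SES of PD against random draws of equal richness from the pool."""
    rng = random.Random(seed)
    observed = faith_pd(taxa)
    null = [faith_pd(rng.sample(pool, len(taxa))) for _ in range(n_null)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Two sister taxa (A, B) span less of the tree than two distant taxa (A, C).
print("PD(A,B) =", faith_pd(["A", "B"]), " SES =", round(ses_pd(["A", "B"]), 2))
print("PD(A,C) =", faith_pd(["A", "C"]), " SES =", round(ses_pd(["A", "C"]), 2))
```

Under this kind of null model, a negative SES is conventionally read as phylogenetic clustering (less PD than expected for the observed richness) and a positive SES as overdispersion, which is the sense in which richness and PD can point in different directions.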

17

Mitag4taxa: Extracting SSU rRNA Illumina reads from metagenomes for taxonomic classification

He, Y.; Du, Y.; Nguyen, L.; Wang, Y.

2026-05-05 bioinformatics 10.64898/2026.05.01.722230 medRxiv

Top 0.2%

1.1%

The prevailing taxonomic profiling methods for environmental samples rely heavily on PCR amplification of SSU ribosomal RNA (rRNA) genes and on genome-based reference databases. Extracting SSU rRNA fragments directly from Illumina metagenomic sequencing data is PCR-independent, but recognizing those fragments is technically challenging. Here we present Mitag4taxa, a computational pipeline designed for taxonomic profiling of microbial communities from metagenomic Illumina sequencing reads containing rRNA tags (mitag). Hidden Markov Models (HMMs) of SSU rRNA genes, of the V4 region of 16S rRNA genes, and of the V9 region of 18S rRNA genes were built from representative sequences of different families and the corresponding hypervariable regions in the SILVA database. The pipeline identifies and extracts 16S and 18S rRNA gene fragments, along with their quality scores, from metagenomic or metatranscriptomic datasets using an HMM search with these models. The hypervariable regions, including the V4 region of 16S rRNA and the V9 region of 18S rRNA genes, can be further scanned and recruited for taxonomic classification and biodiversity estimation. To demonstrate its reliability, the performance of Mitag4taxa was evaluated using both real and simulated datasets. In human gut metagenomic assessments, taxonomic profiles derived from Mitag4taxa showed high consistency with those based on conventional 16S rRNA gene amplicons, identifying dominant families such as Bacteroidaceae and Prevotellaceae with similar relative abundances. Statistical analyses confirmed highly significant positive correlations between Mitag4taxa and amplicon-based community structures. The 18S V9 module was further validated using shotgun metagenomic data from deep-sea sediment cores, successfully recovering key eukaryotic taxa such as Collodaria and Leotiomycetes. Furthermore, benchmarking against the RiboTagger software using CAMI marine simulated datasets revealed that Mitag4taxa achieved a higher average F1 score and lower error metrics. Overall, Mitag4taxa provides a complementary strategy for microbial community profiling that is independent of both rRNA gene amplicons and reference genomes, enabling improved detection of both prokaryotic and eukaryotic taxa from metagenomic and metatranscriptomic sequencing data.

18

Taxonomic profilers and their influence on metagenomic diversity analyses

Rondeau-Leclaire, J.; Blanchet, G.; Jacques, P.-E.; Laforest-Lapointe, I.

2026-05-30 bioinformatics 10.64898/2026.05.27.727884 medRxiv

Top 0.2%

1.0%

Estimating taxonomic profiles is a central task in microbiome research. Several bioinformatic tools have been developed for this purpose, differing in algorithmic strategy, reference database flexibility, sensitivity parameters, and the type of abundance they estimate. As a result, taxonomic profiles carry an unwanted methodological signal whose driving characteristics remain understudied. While benchmarks have evaluated the performance of some of these tools, they rely on simulated data; little work has been done to compare them using real metagenomes in the presence of noise and uncharacterised diversity. Overall, the impact of taxonomic profiler choice and parameterisation on scientific conclusions remains poorly understood. Here, we provide a much-needed characterisation of four taxonomic profilers to help researchers better understand the available bioinformatic tools and inform their methodological choices. Then, we leverage 1,211 shotgun metagenomes from eight datasets to compare these taxonomic profilers across 13 methodological designs. Based on diversity indices, we found substantial variability in estimated taxonomic composition depending on methodological features such as reference database and algorithmic strategy. Alpha diversity and its analysis varied substantially with tool choice (particularly among k-mer-based tools) and with reference database. Beta diversity showed sensitivity to both database and parameter choices, yet this variability barely affected statistical inference. This work raises awareness about the causes of variability in metagenome analysis attributable to choices in taxonomic profiling methodology. Our findings highlight the sensitivity of taxonomic diversity analyses to these choices and the importance for researchers to consider assessing the robustness of their results to choice of tool, parameter, and reference database. 
Crucially, differences in sample diversity across methodologies are symptomatic of differences in estimated taxonomic composition, which can affect any analysis based on taxonomic abundances. Overall, this study underscores the importance of tool selection and parametrisation, and of conducting sensitivity analyses to support robust and reliable scientific conclusions.
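
A simple way to picture the kind of sensitivity check this abstract recommends is to profile the same sample with two tools and compare the resulting diversity estimates. The sketch below computes Shannon (alpha) diversity for each profile and the Bray-Curtis dissimilarity between them on relative abundances. The two profiles and their counts are invented for illustration, and the abstract does not state which indices or implementations the authors used.

```python
import math


def relative(counts):
    """Convert raw counts to relative abundances, dropping zero entries."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items() if n > 0}


def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa."""
    return -sum(p * math.log(p) for p in relative(counts).values())


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on relative abundances (0 = identical)."""
    pa, pb = relative(a), relative(b)
    shared = sum(min(pa.get(t, 0.0), pb.get(t, 0.0)) for t in set(pa) | set(pb))
    return 1.0 - shared


# Hypothetical read counts for ONE sample as reported by two profilers.
profiler_x = {"Bacteroides": 60, "Prevotella": 30, "Faecalibacterium": 10}
profiler_y = {"Bacteroides": 50, "Prevotella": 25, "Faecalibacterium": 10,
              "Roseburia": 15}

print("Shannon, profiler X:", round(shannon(profiler_x), 3))
print("Shannon, profiler Y:", round(shannon(profiler_y), 3))
print("Bray-Curtis between profiles:", round(bray_curtis(profiler_x, profiler_y), 3))
```

Any nonzero dissimilarity between two profiles of the same sample is purely methodological signal, which is why repeating a downstream analysis under more than one tool or database can be informative.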

19

Rapid shallow-water saturation and deep-water expansion of an invasive freshwater ecosystem engineer in a deep European lake

Hofstetter, L.; Mueller, T. M.; Bourqui, M.; Burlakova, L. E.; Cristante, Z. C.; Karatayev, A. Y.; Kessler, S.; Narwani, A.; Santos, J. L.; Sturm, L.; Wellauer, N.; Spaak, P.; Weber, A. A.-T.

2026-06-27 ecology 10.64898/2026.06.26.734794 medRxiv

Top 0.2%

1.0%

Quagga mussels (Dreissena rostriformis bugensis) are ecosystem engineers that can alter nutrient cycling, benthic-pelagic coupling, and food-web structure in deep lakes. Although their invasion trajectories are well documented in the Laurentian Great Lakes in North America, depth-specific population dynamics remain poorly resolved in recently invaded European perialpine lakes. We analyzed five annual lake-wide surveys (2021-2025) from 54 stations spanning 2.4-253 m depth in Lake Constance to quantify changes in quagga mussel density, biomass, and shell-length distribution. Contrary to expectations of lake-wide exponential growth, shallow-water populations (< 20 m) showed no significant increase during the study period and appear to have reached carrying capacity before monitoring began. In contrast, densities increased monotonically at intermediate depths (40-125 m), indicating ongoing expansion into deeper strata. Mean shell length declined with depth, and size distributions in shallow waters shifted toward larger individuals, consistent with a transition from active recruitment to somatic growth of established mussels. Compared with the Laurentian Great Lakes, Lake Constance already has substantially higher shallow-water biomass, whereas deeper invasion trajectories are broadly similar. These results show that quagga mussel invasion in deep European lakes can combine rapid littoral saturation with slower profundal expansion, complicating direct transfer of predictions from the Great Lakes. Continued depth-stratified monitoring will be essential for anticipating future ecosystem effects in perialpine lakes.

20

Charting the insect biodiversity of Crete: insights from a pilot metabarcoding survey

Koutsovoulos, G. D.; Sorg, M.; Hörren, T.; Buchner, D.; Bourlat, S. J.; Langen, K.; Trichas, A.; Leese, F.; Stamatakis, A.

2026-06-08 ecology 10.64898/2026.06.05.730060 medRxiv

Top 0.2%

0.9%

Among eukaryotes, insects are by far the most diverse organisms on Earth, yet their global decline threatens ecosystem stability. Understanding local and regional biodiversity patterns is critical for conservation planning, ecosystem management, and predicting responses to environmental change, but traditional surveys for assessing insect diversity (e.g., manual collection, morphological identification, and counting) are highly labor-intensive, time-consuming, and often require rare or simply unavailable dedicated taxonomic expertise. DNA metabarcoding offers an efficient, high-resolution alternative to assess insect communities. Here, we report on the first insect metabarcoding survey on Crete that spans two years of sample collection between 2021 and 2023 from a small area in Southern Central Crete in the context of a citizen science project. A total of 29 samples yielded 10,865 Exact Sequence Variants (ESVs), 10,516 of which were assigned to insects, covering 988 species, 900 genera, and 227 families across 14 orders. A comparison with the existing observation records reveals 406 potential newly-observed species and an estimated 690 unclassified species, indicating substantial cryptic diversity. Our results demonstrate that even small-scale sampling can unravel substantial insect diversity and highlight critical gaps in barcode reference databases. Our study demonstrates how DNA metabarcoding can accelerate biodiversity discovery and monitoring in understudied regions.